In the future, human-like robots will live among people to provide company and help carry 
out tasks in cooperation with humans. These interactions require that robots understand 
not only human actions, but also the way in which we perceive the world. Human perception 
heavily relies on the time dimension, especially when it comes to processing visual motion. 
Critically, human time perception for dynamic events is often inaccurate. Robots interacting 
with humans may want to see the world and tell time the way humans do: if so, they must 
incorporate human-like fallibility. Observers asked to judge the duration of brief scenes are 
prone to errors: perceived duration often does not match the physical duration of the event. 
Several kinds of temporal distortions have been described in the specialized literature. Here 
we review the topic with a special emphasis on our work dealing with time perception of 
animate actors versus inanimate actors. This work shows the existence of specialized 
time bases for different categories of targets. The time base used by the human brain to 
process visual motion appears to be calibrated against the specific predictions regarding the 
motion of human figures in case of animate motion, while it can be calibrated against the 
predictions of motion of passive objects in case of inanimate motion. Human perception 
of time appears to be strictly linked with the mechanisms used to control movements. 
Thus, neural time can be entrained by external cues in a similar manner for both perceptual 
judgments of elapsed time and in motor control tasks. One possible strategy could be to 
implement in humanoids a unique architecture for dealing with time, which would apply 
the same specialized mechanisms to both perception and action, similarly to humans. This 
shared implementation might render the humanoids more acceptable to humans, thus 
facilitating reciprocal interactions. 
INTRODUCTION 

Robots are potentially very useful in several tasks where human 
resources may be limited or need to be spared, for example the 
assistance of elderly people, care of children, physical therapy of 
disabled people, search and rescue of people in unsafe 
environments, or general help in daily life. These and similar tasks 
require a robot-human interaction, the interaction being proximal 
when the two (or more) partners are co-located (service robots 
placed in the same locale as humans; Mortl et al., 2012), or remote 
when the partners are separated spatially and/or temporally (as 
in tele-operation; Goodrich and Schultz, 2007). In both cases, 
the interaction implies some sort of communication between the 
partners, and humanoid robots appear especially well suited to 
communication (e.g., Breazeal, 2003; Minato et al., 2004; Calinon 
et al., 2007). Humanoids are autonomous robots with anthropomorphic 
features, capable of mimicking human-like actions and 
producing human-like reasoning (Goodrich and Schultz, 2007; 
Schaal, 2007). 

Robot-human interactions present several formidable challenges, 
some of which are listed below. On the one hand, there 
is the hope that, in the future, humanoids will be as human-like 
as possible, in order to be able to interact with people in 
the most natural manner (Jarrasse et al., 2012). For instance, it has 
recently been shown that the presentation of a humanoid face 
triggers an automatic orientation of spatial attention in humans, just 
as the presentation of a human face does (Chaminade and Okka, 
2013). On the other hand, paradoxically, the more human-like the 
appearance of a robot, the greater can be the social and emotional 
implications of its interaction with humans, because humans must 
accept the robot as an animate or quasi-living creature. As first 
hypothesized by Mori (1970), the sense of familiarity and general 
emotional response of a person who interacts with a robot may 
not increase monotonically with increasing anthropomorphism 
of the robot. At some point, the human reaction may suddenly 
become very negative when the robot closely but imperfectly 
reproduces a human being. Mori called this effect the "uncanny 
valley of eeriness." The effect has recently been quantified by 
applying signal detection theory to the display of different types 
of computer-animated figures (Chaminade et al., 2007). By 
measuring the response bias of human observers toward "biological" 
or "artificial" categorization, it was found that the bias toward 
"biological" decreased with the figures' anthropomorphism, 
consistent with the "uncanny valley" hypothesis. Moreover, imaging the 
brain during the presentation of the different figures showed that 
the "biological" bias correlates positively with activity in regions 
involved in social cognitive processes such as mentalizing 
activity (e.g., the temporo-parietal junction; Chaminade et al., 2007). 
These findings therefore suggest that humans may not understand, 
feel empathy for, or collaborate efficiently with humanoids 
that are highly anthropomorphic but are still perceived as 
artificial. In this respect, it may be more crucial that humanoids 
behave in a human-like manner than that they resemble humans. 
Thus, Nisky et al. (2012) proposed a Turing-like test for assessing 
alternative styles of handshake performed by a machine. The 
test is administered through a telerobotic system in which an 
interrogator holds a robotic stylus and interacts with another 
party, human or artificial. The inability of a human interrogator 
to distinguish between the handshake performed by a person 
and that performed by the machine indicates that the machine 
behaves in a human-like manner. There also exist standardized 
questionnaires to measure human perception of anthropomorphism, 
animacy, likeability, intelligence, and safety of robots 
(Bartneck et al., 2009). 
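
The response-bias measure from signal detection theory mentioned above can be illustrated with a short sketch. The formulas are the standard textbook definitions of sensitivity (d') and criterion (c) for a yes/no task; the exact computation used by Chaminade et al. (2007) is not described here, so the function name and the mapping of "hit" and "false alarm" onto the "biological"/"artificial" categorization are assumptions made for illustration only.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) for a yes/no categorization task.

    Assumed mapping (illustrative): a "hit" is answering "biological" to a
    biological stimulus, a "false alarm" is answering "biological" to an
    artificial one. Rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa            # sensitivity
    criterion = -0.5 * (z_hit + z_fa) # negative values: bias toward "biological"
    return d_prime, criterion
```

With this convention, an observer who answers "biological" more readily than "artificial" yields a negative criterion, and a weakening of that bias moves the criterion toward zero or above.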

Currently, much attention is being paid to endowing 
robots with human-like movement features, under the premise 
that humans will collaborate better with robots which move 
like humans. Indeed, some progress has been made in 
implementing human-like movements in some robots (Schaal, 2007; 
Sugimoto et al., 2012). Although the movements of most 
current robots are still a caricature of human movements, attesting 
to the difficulty of imitating us, a promising approach appears to be the 
application of the movement primitives extracted from human 
subjects (for instance, by means of principal component 
analysis) to transfer the features of human movement to a robot 
(Choe et al., 2007; Moro et al., 2012). Even more challenging 
appears to be the task of endowing robots with the ability to 
understand the manner in which humans perceive the world, another 
critical prerequisite for cooperative interactions between robots 
and humans. In humans, action is strictly coupled to 
perception. Because of substantial sensori-motor delays, most motor 
responses of humans cannot be simply reactive to a given external 
event, but must be somehow predictive, that is, the responses 
must incorporate knowledge about the forthcoming evolution 
of the event (Zago et al., 2009). In fact, it is known that 
perception and action share, at least in part, common 
representations and common knowledge (Prinz, 1997; Rizzolatti and 
Craighero, 2004; Choe et al., 2007). To accomplish shared tasks, 
robots and humans should interact knowing what each other 
is doing. Of course, robots could be endowed with their own, 
idiosyncratic knowledge-based perceptual system, but presumably 
they would interact more successfully with humans if they 
shared with humans a similar knowledge-based perceptual 
system, as well as temporal cognition (Maniadakis and Trahanias, 
2011). 

As remarked above, neural processing of sensory information 
is fraught with substantial delays (considerably longer than those 
typically present in robots), but the brain somehow compensates 
for them, so that we are unaware of constantly living in the past, 
so to speak (Nijhawan, 2008). Thus, neural responses lag behind 
the adequate visual stimulus by 50-100 ms in several visual 
cortical areas, including the primary visual cortex (Schmolesky et al., 
1998). The flash-lag effect is a visual illusion in which a flashed 
object appears to lag behind a moving object, when physically the 
two objects are co-localized at the instant of the flash (Nijhawan, 
1994). One explanation of the effect is that the visual system is 
predictive, accounting for neural delays by extrapolating the 
trajectory of the moving stimulus into the future (Nijhawan, 1994). 
Alternatively, however, visual awareness might be postdictive, so 
that the percept attributed to the time of an event is a function of 
what happens during the 80 ms after the event (Eagleman 
and Sejnowski, 2000). 
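
As a rough illustration of the predictive account, the following sketch extrapolates the position of a moving stimulus over an assumed neural delay. The linear extrapolation rule, the function name, and the default delay of 80 ms (chosen within the 50-100 ms range quoted above) are illustrative assumptions, not a model taken from the cited studies.

```python
def extrapolated_position(position, velocity, neural_delay_s=0.08):
    """Position assigned to a moving stimulus if the visual system
    compensates a processing delay by linear extrapolation.

    position: last registered position (e.g., degrees of visual angle)
    velocity: stimulus velocity (same units per second)
    neural_delay_s: assumed processing delay in seconds (hypothetical value)
    """
    return position + velocity * neural_delay_s
```

Under these assumptions a target moving at 10 deg/s is perceived about 0.8 deg ahead of a flash presented at the same physical location, whereas a stationary target (zero velocity) is not displaced at all.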

Moreover, processing delays can differ significantly among 
different sensory channels: for instance, acoustic stimuli are pro- 
cessed much faster than visual stimuli. Nevertheless, when we see 
and hear someone snapping his or her fingers, we perceive the 
event as unitary. The sight and sound appear simultaneous, as 
if the brain synchronized internally the corresponding visual and 
auditory signals. 

Human perception is a vastly complex performance, but the 
temporal dimension is essentially ubiquitous because perceived 
actions and events unfold in time. Animals, people (and less fre- 
quently, inanimate objects) are seldom static, and our sensory 
landscape is typically dynamic, populated by moving targets. The 
critical point to be considered for implementing human-like per- 
ceptual abilities in robots is that human perception of elapsed 
time for actions and events is two-sided, being both quite pre- 
cise and quite inaccurate. In general, the precision (variable error) 
exhibited by humans in processing time information across an 
extremely large range of temporal intervals is striking. The Weber 
ratio is about 10% over 10 orders of magnitude of the base time 
interval, from the microsecond timing of sound localization to the 
24-h period of events evolving with a circadian rhythm (Gibbon, 
1977; Buhusi and Meck, 2005). On the other hand, the accuracy 
(constant error) of estimates of the duration of events can 
be surprisingly poor, perceived duration often being very loosely 
related to the physical duration of the event. Subjective durations 
can be systematically overestimated (time dilation), or 
underestimated (time compression), and the performance is highly 
context-dependent (Fraisse, 1963; Mauk and Buonomano, 2004; 
Eagleman et al., 2005; Eagleman, 2008). 
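
The distinction between precision (variable error) and accuracy (constant error) can be made concrete with a minimal sketch. The function name is hypothetical, and the coefficient of variation is used here only as a simple stand-in for the Weber ratio.

```python
from statistics import mean, stdev


def timing_errors(estimates, physical_duration):
    """Summarize a set of duration estimates (all in the same time unit).

    Returns (constant_error, variable_error, coefficient_of_variation):
    - constant error: mean estimate minus physical duration (accuracy)
    - variable error: sample standard deviation of the estimates (precision)
    - coefficient of variation: variable error divided by the mean estimate,
      used here as a stand-in for the Weber ratio
    """
    m = mean(estimates)
    constant_error = m - physical_duration
    variable_error = stdev(estimates)
    return constant_error, variable_error, variable_error / m
```

For example, estimates of 0.9, 1.0, and 1.1 s for a 0.8 s event have a coefficient of variation of about 10%, in line with the Weber ratio quoted above, yet they overestimate the physical duration by 0.2 s on average: the judgments are precise but inaccurate.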

Here we briefly consider some examples of time distortions in 
human perception, and dwell more extensively on the special case 
of the effects of visual motion on subjective duration. Also, we 
will mainly discuss the perception of events unfolding over scales 
of tens to hundreds of milliseconds, because these time scales 
are common to typical motor actions. We will argue that human 
perception of time is strictly linked with the way humans control 
their own movements. Therefore, implementation of human-like 
perception in humanoids will also depend on the progress being 
made in implementing human-like motor control. 

DISTORTIONS OF PERCEIVED TIME 

Perceived duration is affected by several factors, as shown by the 
behavior in response to the presentation of simple visual stimuli. 
When humans are asked to judge the duration of a flash, they often 
make systematic errors. Thus, a simple reduction in the visibility of 
a flash leads to underestimating its duration (Terao et al., 2008). In 
addition to luminance, the numerosity and size of the stimuli also 
affect time estimates: stimuli with larger magnitudes in these 
non-temporal dimensions are judged to be temporally longer (Xuan 
et al., 2007). 

The extent to which the stimulus can be predicted also affects 
time perception (Ulrich et al., 2006). If a given stimulus is flashed 
repeatedly, the duration of the first stimulus appears longer than 
that of the successive stimuli (Pariyadath and Eagleman, 2007). By 
the same token, a stimulus which stands out as different from all 
the others in a series appears to last longer than the other stimuli, 
even though they all have the same physical duration (Tse et al., 
2004). Another well-known factor affecting perceived duration is 
the amount of attention paid to the stimulus: the 
higher the level of attention, the longer is the perceived duration 
(Tse et al., 2004; New and Scholl, 2009). 

Another kind of distortion in time perception occurs when 
the stimulus is presented close in time to the execution of a 
movement performed by the observer. For instance, a visual 
stimulus flashed just after a saccadic eye movement appears 
to last longer than normal (Yarrow et al., 2001). On the other 
hand, duration judgments are compressed during eye saccades 
(Morrone et al., 2005). In the latter case, observers largely 
underestimate the time interval elapsed between two brief visual stimuli 
which are flashed near in time to a saccade. Another 
example of distortion is the apparent compression of 
the time epoch which has elapsed between the execution of a 
simple movement (such as a button press) and a subsequent 
event (such as a beep or flash; Haggard et al., 2002). Subjective 
duration of intervals filled with task-irrelevant events is longer 
than that of empty intervals, the increase depending on the 
complexity of the perceptual processing required by the event 
(Buffardi, 1971). 

In addition to those listed above, several other factors affect time 
perception, such as arousal and emotional levels (Hancock and 
Weaver, 2005), stimulus complexity (Roelofs and Zeeman, 1951; 
Schiffman and Bobko, 1974), concurrent task complexity (Macar, 
1996), and temporal uncertainty (Zakay, 1992). Some of these 
distortions can be accounted for within the "counter/accumulator" 
model of time perception (Creelman, 1962; Fraisse, 1963; Treisman, 
1963; Gibbon, 1977; Brown, 1995). In the context of this 
conceptual model, internal pulses are generated, collected, and 
integrated during the presentation of a stimulus. The output of 
the counting process is then compared with memorized time 
representations to estimate the overall duration of a given time 
epoch. In this framework, an increment of the variable (e.g., 
size, luminance, novelty, or arousal) which is critical for time 
perception in a given task would lead to a transient increase 
in the rate of the internal clock. Consequently, the accumu- 
lator would sum a larger number of pulses in a given time 
epoch, and the stimulus duration would be judged accordingly 
longer. 
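
The logic of the "counter/accumulator" account can be sketched in a few lines. This is a deliberately minimal, deterministic toy version: the pulse rates, the reference rate standing in for the memorized time representation, and the function name are all illustrative assumptions rather than parameters from the cited models.

```python
def perceived_duration(physical_duration_s, pacemaker_rate_hz,
                       reference_rate_hz=100.0):
    """Toy pacemaker-accumulator estimate of duration.

    Pulses emitted at pacemaker_rate_hz are summed over the stimulus, and the
    count is read out against a memorized calibration that assumes
    reference_rate_hz pulses per second (hypothetical value).
    """
    accumulated_pulses = pacemaker_rate_hz * physical_duration_s
    return accumulated_pulses / reference_rate_hz
```

If a salient or arousing stimulus transiently raises the pacemaker rate from 100 to 120 pulses per second, a 0.5 s stimulus accumulates 60 pulses instead of 50 and is judged to last about 0.6 s, i.e., time dilation.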

Other models have also been proposed to account for time 
distortions. In one such model, subjective duration parallels the 
amount of neural energy (or the total amount of neural activity) 
used to encode a stimulus (Pariyadath and Eagleman, 2007). In 
higher cortical areas, neuronal firing rate tends to decrease in 
response to repeated presentations of the stimuli, and this may 
explain why subjective duration is longer for the first than for the 
subsequent stimuli in a row. Still another model posits that timing is a 
distributed process, being encoded by the spatio-temporal patterns 
of activity in multiple neural populations (Mauk and Buonomano, 
2004). A stimulus typically engages hundreds of excitatory and 
inhibitory neurons, and also triggers time-dependent processes 
(e.g., synaptic plasticity). As a consequence, the state of the neural 
network is different when another stimulus arrives slightly later. 
The difference in the network activity produced by the second and 
first stimulus may code for the time interval separating the two 
stimuli. 

PERCEIVED DURATION OF VISUAL MOTION 

Considerable progress has been made in the phenomenological 
knowledge in this field of research over the last few years 
(see Eagleman, 2008; Zago et al., 2011a). Not only is visual 
motion common in daily life, but it is also so salient that the 
changes over time of the visual stimuli may index the passage 
of time by themselves: how much time has passed can be 
determined by counting these indices (Brown, 1995). This is closely 
related to the "counter/accumulator" model mentioned above. 
As one would expect from the application of this model, visual 
motion is typically associated with misperceptions of elapsed 
time. Thus, it is known that the perceived duration of a 
moving stimulus is longer than that of a stationary stimulus having the 
same physical duration (Lhamon and Goldstone, 1975; Brown, 
1995; Kanai et al., 2006), and the apparent duration of the 
moving stimulus increases with increasing speed (Leiser et al., 
1991; Brown, 1995; Beckmann and Young, 2009; Kaneko and 
Murakami, 2009). Indeed, according to the "counter/accumulator" 
model, faster stimuli would generate a greater number of 
events, and the corresponding estimated duration would be 
longer. The specific kinematic profile of the moving 
target can also affect temporal judgments. For instance, a 
constant-speed motion seems to last longer than a decelerating motion, 
which in turn seems to last longer than an accelerating motion 
(Matthews, 2011). 

The specific dynamic factor associated with visual motion 
which is responsible for the time distortion is still unclear. Accord- 
ing to one hypothesis, stimulus speed would be directly involved: 
the apparent duration would increase proportionally with the log- 
arithm of speed (Kaneko and Murakami, 2009). According to an 
alternative hypothesis, however, temporal frequency rather than 
speed would be the critical factor, as shown by the fact that time 
dilation can be induced simply by flickering a stimulus, with no 
need for motion (Kanai et al., 2006). 
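
The speed hypothesis can be written as a simple formula in which apparent duration grows linearly with the logarithm of speed. The sketch below only illustrates that functional form: the gain, the reference speed, and the function name are hypothetical placeholders, not values fitted by Kaneko and Murakami (2009).

```python
import math


def apparent_duration(physical_duration_s, speed_deg_s,
                      gain=0.1, reference_speed_deg_s=1.0):
    """Apparent duration that increases linearly with log speed.

    gain and reference_speed_deg_s are illustrative assumptions; at the
    reference speed the apparent duration equals the physical duration.
    """
    log_speed = math.log10(speed_deg_s / reference_speed_deg_s)
    return physical_duration_s * (1.0 + gain * log_speed)
```

A signature of this form is that each tenfold increase in speed adds the same increment to the apparent duration, so the dilation from 1 to 10 deg/s equals the dilation from 10 to 100 deg/s.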

In addition to the visual effects induced in real-time by a 
moving stimulus, there are also after-effects. For example, the 
prolonged exposure to a pattern moving at constant speed affects 
the perceived speed of subsequent moving patterns: the perceived 
speeds of that stimulus and of all slower stimuli are reduced, while 
the perceived speed of faster stimuli is increased (Thompson, 
1981; Smith and Edgar, 1994; Hammett et al., 2005; Hietanen 
et al., 2008). These after-effects can be accounted for by current 
models of speed processing. Perceived speed is thought to be 
based on the ratio of the outputs of low-pass and band-pass 
temporal filters, corresponding to a low- and high-speed 
channel whose sensitivities decay exponentially over time (Smith and 
Edgar, 1994; Hammett et al., 2005). Adaptation to a fast speed 
produces a change in filter sensitivities resulting in a drop of the 
ratio, and perceived speed is slower. Instead, following 
adaptation to a slow speed, the change in filter sensitivities results in 
an increase of the ratio, and perceived speed is faster. Similar 
mechanisms are presumably at play in time perception. Thus, 
the apparent duration of a dynamic stimulus is reduced in a 
region of visual space following motion adaptation (Johnston 
et al., 2006), and the effect of this adaptation can be spatially 
selective either in retinal (Bruno et al., 2010) or external coordinates 
(Burr et al., 2011). 
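
The ratio account of speed after-effects can be illustrated with a toy calculation. The sketch assumes that the ratio is taken as high-speed (band-pass) output over low-speed (low-pass) output, so that a drop of the ratio means a slower percept; this orientation, the sensitivity values, and the function name are assumptions made for illustration and are not taken from the cited models.

```python
def perceived_speed_index(low_response, high_response,
                          low_sensitivity=1.0, high_sensitivity=1.0):
    """Toy ratio code for speed: high-speed channel over low-speed channel.

    Adaptation is modeled as a reduction of one channel's sensitivity
    (hypothetical multiplicative factor between 0 and 1).
    """
    return (high_sensitivity * high_response) / (low_sensitivity * low_response)
```

With unadapted channels a stimulus producing responses of 1.0 (low) and 2.0 (high) gives an index of 2.0. Reducing the high-speed sensitivity to 0.7, as might follow adaptation to a fast speed, lowers the index to 1.4 (slower percept), whereas reducing the low-speed sensitivity to 0.7 raises it to about 2.86 (faster percept).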

PERCEPTION IS TUNED TO DOMINANT PROPERTIES OF THE 
ENVIRONMENT 

Perceptual biases are not simply the result of idiosyncratic neural 
processing of sensory signals, but often reflect a priori 
hypotheses made by the brain about the functional significance of the 
signals. In particular, it is thought that, under evolutionary and 
developmental pressure, the brain adapts to be tuned to the 
statistical properties of the signals to which it is exposed most 
frequently (Simoncelli and Olshausen, 2001). For instance, the 
statistical distribution of target speeds in the natural 
environment is skewed toward low values. A prior preference for slow 
speeds can result in severe misperceptions, as when the speed 
of a visual target is underestimated with small target size or low 
contrast. These misperceptions are accounted for by the fact that 
the noisier the signal (as with small, low-contrast targets), the 
greater is the influence of the prior assumption of low speed 
(Weiss et al., 2002). 
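
The way a slow-speed prior interacts with signal noise can be sketched with the standard result for combining a Gaussian prior with a Gaussian measurement. The sketch assumes a prior centered on zero speed; the prior width, the noise values, and the function name are illustrative assumptions rather than parameters reported by Weiss et al. (2002).

```python
def estimated_speed(measured_speed, measurement_sd, prior_sd=2.0):
    """Posterior-mean speed estimate under a zero-centered Gaussian prior.

    measurement_sd: noise of the sensory measurement (larger for small,
                    low-contrast targets)
    prior_sd: width of the slow-speed prior (hypothetical value)
    """
    prior_var = prior_sd ** 2
    noise_var = measurement_sd ** 2
    # Weight given to the measurement; the remainder goes to the prior (zero).
    weight = prior_var / (prior_var + noise_var)
    return weight * measured_speed
```

For a target measured at 10 deg/s, a reliable measurement (SD 0.5) yields an estimate close to 10, whereas a noisy measurement (SD 4.0) is pulled strongly toward zero and yields an estimate of 2: the noisier the signal, the greater the underestimation of speed.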

Prior hypotheses about the environment can be revealed by 
the presence of illusions and misperceptions under unusual 
conditions, but their functional utility lies in the ability to improve 
the performance under ecological conditions. One such prior 
hypothesis concerns the ubiquitous and highly predictable effects 
of Earth's gravity (Zago et al., 2009). Gravity plays a major role in 
determining the orientation of objects in the environment, and 
therefore the structure of our visual field. Most natural images are 
anisotropic, with more image structure at orientations parallel or 
orthogonal to the direction of gravity in a fronto-parallel plane 
(Hansen and Essock, 2004). These image anisotropies are often 
matched by corresponding anisotropies in perceptual responses, 
consistent with the hypothesis that the brain takes into account 
the statistics of the environment. The well-known "oblique effect" 
refers to the fact that contours are better discriminated when 
they are oriented vertically or horizontally (cardinal directions) 
than when they are oriented obliquely (Appelle, 1972). 
Similarly, motion direction is better discriminated along cardinal than 
oblique axes (Ball and Sekuler, 1987). 

Recently, anisotropies related to the direction of motion have 
been described in a task of time perception (Moscatelli and 
Lacquaniti, 2011). Observers were asked to judge the duration of 
motion of a target accelerating in one of four different directions, 
downward, upward, leftward, or rightward relative to a visual 
scene. Downward motion complied with the gravity constraint, 
whereas motion in the other directions violated this constraint. It 
was found that the precision of the duration estimates exhibited 
systematic anisotropies, the performance being significantly better 
for downward motion than for the other directions (Figure 1). The 
results demonstrated that prior knowledge about gravity force is 
incorporated in the neural mechanisms computing elapsed time. 
Similar mechanisms are at work when timing interception actions. 
Thus, Zago et al. (2011b) asked participants to press a button 
triggering a hitter to intercept a target accelerated by a virtual 
gravity. A factorial design assessed the effects of scene orientation 
(normal or inverted) and target gravity (normal or inverted, 
Figure 2). It was found that interception was significantly more 
successful when scene direction was concordant with target 
gravity direction, irrespective of whether both were upright or inverted 
(Figure 3). 

FIGURE 1 | Discrimination of visual motion duration. In different trials, a 
target moved downward, upward, leftward, or rightward with constant 
acceleration (9.81 m s⁻²) and randomized initial speed, resulting in a 
variable total duration of motion. Observers judged whether the duration of 
the test stimulus was longer or shorter than a standard duration (800 ms). 
(A) Population psychometric functions for downward motion (blue) and 
upward motion (red). The graphs show the proportion of times the test 
stimulus appeared to last longer than the standard (data pooled over all 
participants). (B) For each motion direction, the precision of discrimination 
was assessed as the slope of the population response (all values 
normalized relative to the downward condition): the higher the slope, the 
greater the precision. Error bars: ± 1 SD. Significant differences: 
***p < 0.001 and *p < 0.05. Replotted with permission from Moscatelli 
and Lacquaniti (2011). 
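
The link between the slope of a psychometric function and the precision of duration discrimination can be illustrated with a short sketch. It assumes a cumulative Gaussian form for the psychometric function, which is a common choice but is not stated in the text as the method of the original study; the 800 ms standard comes from the task described above, while the function names and the values of the spread parameter are hypothetical.

```python
from statistics import NormalDist


def proportion_longer(test_duration_ms, pse_ms=800.0, sigma_ms=100.0):
    """Probability of judging the test as longer than the standard,
    modeled as a cumulative Gaussian centered on the point of subjective
    equality (pse_ms) with spread sigma_ms (assumed values)."""
    return NormalDist(mu=pse_ms, sigma=sigma_ms).cdf(test_duration_ms)


def slope_at_pse(sigma_ms):
    """Slope of the cumulative Gaussian at its midpoint (per ms).
    A smaller sigma gives a steeper slope, i.e., higher precision."""
    return NormalDist(mu=0.0, sigma=sigma_ms).pdf(0.0)
```

In this description, a condition with better precision, such as downward motion in the study above, corresponds to a smaller spread and hence a steeper function; for example, a spread of 80 ms gives a steeper slope at the midpoint than a spread of 120 ms.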

OBSERVATION OF BIOLOGICAL MOTION 

Humans have evolved to recognize and interpret the behavior 
of other humans so as to interact with them effectively. 
Specialized mechanisms in the form of configural processing can 
help in the recognition process (Reed et al., 2012). For instance, 
changes in animate targets are detected faster than those in 
inanimate targets (New et al., 2007). Moreover, there is 
growing evidence that, to deal with animate motion, the brain uses 
mechanisms partially different from those used to deal with the 
motion of inanimate objects. The neural networks processing 
animate and inanimate targets are partially segregated in the 
brain (Caramazza and Shelton, 1998). This specialization takes 
advantage of the fact that the kinematics and dynamics of 
animals differ from those of passive objects on several counts 
(Zago et al., 2011a). 

Recently, the hypothesis of specialized processing of animate 
and inanimate targets has been extended to encompass the 
temporal domain (Carrozzo et al., 2010). Namely, the hypothesis 
holds that there exist distinct time bases for animate and 
inanimate events. This specialization would enhance our ability to 
predict critically timed actions. When animacy is detected by 
a human observer, time is calibrated against the predictions 
regarding the motion of people and animals, allowing 
synchronization in inter-personal actions. When no animacy is detected, 
the time is calibrated against the predictions of motion of passive 
objects. This is consistent with the idea that time perception can 
be embodied, i.e., that affective and body states influence time 
judgments (Maniadakis and Trahanias, 2011). 

Consistent with this hypothesis, there is evidence that time 
perception and motor timing are influenced by animacy: the 
observation of a biological movement performed by other 
people biases the timing of a motor act or the judgment of perceived 
duration of an event (Watanabe, 2008; Bove et al., 2009; Carrozzo 
et al., 2010; Orgs and Haggard, 2011; Zago et al., 2011b,c; Mouta 
et al., 2012; Wang and Jiang, 2012; Carrozzo and Lacquaniti, 2013). 
In particular, Carrozzo et al. (2010) used interference paradigms 
in which a timing task was run concurrently with the 
presentation of different figures animated with computer graphics in the 
background of the scene (Figures 4A,B). In separate experiments, 
they used two different timing tasks: (1) button-press responses 
aimed at intercepting a moving ball, and (2) discrimination of the 
duration of a stationary flash. The timing tasks served as probes 
to reveal biases or distortions of time induced by the background 
figures. In both tasks, the observers were presented with different 
background scenes before and during the execution of the task. 
The scene displayed figures which could differ in terms of 
biological (human) or non-biological appearance and kinematics. In all 
cases, the background figures and their movements were totally 
unrelated to the foreground target and to the viewer's action. 
Carrozzo et al. (2010) found that, for both the motor interception 
and the time discrimination task, there was a systematic offset 
between the time estimates associated with biological movements 
and the time estimates associated with non-biological movements, 
consistent with the hypothesis that there exist timing mechanisms 
differentially tuned to these two sets of movements. In another 
study, the speed of the movements of the background figures 
was varied across sessions, so that the motion speed of all the 
segments of the character was scaled up or down by the same 
amount and to the same extent for both the biological and the 
non-biological figure (Carrozzo and Lacquaniti, 2013). The results 
confirmed the existence of an offset between the time estimates 
associated with biological movements and the time estimates 
associated with non-biological movements (Figure 4). Moreover, 
animation speed affected time estimates very differently for the 
two categories of movement: increasing the speed of the whirligig 
increased the delay of the responses considerably, whereas the 
effect of the dancer's speed was weaker and in the opposite 
direction. 

FIGURE 2 | Scenes displayed in the manipulation of visual congruence 
between background and gravity orientation. The target ball was 
launched vertically from the launcher, hit the opposite surface and bounced 
back. The target decelerated from launch to bounce (blue trajectory), and it 
accelerated after bounce (red trajectory). Blue and red segments were not 
present in the actual movies. When the button was pressed, the standing 
character shot a bullet toward the interception point (indicated by the 
cross-hair). The direction of the scene ("s") and the direction of gravity acting 
on the target ("g") were varied in different blocks of trials: (A) normal scene 
and gravity, (B) normal scene and inverted target gravity, (C) inverted scene 
and gravity, (D) inverted scene and normal target gravity. Modified with 
permission from Zago et al. (2011b). 

FIGURE 3 | Success rate for each type of scene in the manipulation of 
visual congruence between background and gravity orientation. 
Brackets indicate that success rate was significantly (p < 0.05) higher for 
the congruent scenes (A,C in Figure 2) than for the incongruent ones (B,D). 
Modified with permission from Zago et al. (2011b). 

These results indicate that vision of human and inanimate 
motions exerts differential top-down influences on automatic 
processes computing time. Interference effects are observed when the 
background motion is unrelated to the task performed by the 
observer. By contrast, when the observed action is related and 
instrumental to the task performance, the interaction between 
the two (observed and performed) actions results in facilitation 
rather than interference (Sebanz and Knoblich, 2009). In this 
vein, Zago et al. (2011c) compared the timing of interception of 
a moving target when it depended on a biological motion or a 
non-biological motion triggered by the observer and simulated 
on the computer screen. They found that the timing significantly 
improved in the presence of biological movements under all 
ecological conditions of coherence between scene and target gravity 
directions. Also, visual discrimination of point-light motion of 
two interacting agents is worse when the two actions are 
desynchronized (Neri et al., 2006). In other words, time-locking in a 
behaviorally meaningful way between interacting agents provides 
an implicit temporal cue and the additional agent can be used to 
predict the expected trajectory of the relevant agent with better 
precision. 

For biological motion, the correct timing of visual images is 
detected more accurately when motion flows in the normal 
forward direction. Thus, when muted video-clips of the lower face of 
speaking actors are shown at a variable rate, both faster and slower 
than the original rate, identification of the natural rate is 
accurate when the movies are played forward but not when they are 
played backward (Viviani et al., 2011a). Similarly, temporal 
reversals in dynamic displays of human locomotion are detected reliably 
only when they are played in the forward direction (Viviani et al., 
2011b). These studies also point to a specific tuning of time 
perception to biological movements. Implicit motor competence for 
the observed actions is presumably instrumental for extracting 
subtle discriminal information from the stimuli allowing correct 
temporal estimates. 

FIGURE 4 | Interference on timed responses by background motion of 
animate or inanimate figures. (A) One frame from a movie of a dancer. 
Motion was captured from a real dancer performing several steps of 
classical ballet, and then rendered using computer graphics. (B) One frame 
from a movie of a whirligig. This consisted of disjointed rods, whose 
angular motion matched that of the corresponding body segment of the 
dancer. In different sessions, dancer and whirligig movements could be 
played at the normal recorded speed, at slow or fast speeds (corresponding 
to 0.5 and 1.5 times the normal one, respectively). (C) Average (±95% 
confidence intervals over all participants) response times for the slow, 
normal, and fast speeds. Modified with permission from Carrozzo and 
Lacquaniti (2013). 

In addition to real motion, apparent motion and implied
motion can also affect time estimates (Orgs and Haggard, 2011). In
particular, static images of an action convey dynamic information
about previous and subsequent moments of the same action, and
provide an impression of motion. Images with implied motion
cause a forward displacement in spatial memory, a phenomenon
known as representational momentum (Freyd, 1983). Implied
motion also affects perceived time, as assessed with classical
psychophysical methods. The duration of a visual stimulus conveying
implied motion information is discriminated more precisely than
that of a similar stimulus without implied motion (Moscatelli et al., 2011;
Nather et al., 2011). Also, visual stimuli with implied motion
produce time dilation just as real motion does (Nather et al., 2011;
Yamamoto and Miura, 2012), although the distortion is smaller
with implied motion. Indeed, when pictures depicting different
sculptures of ballet dancers are shown, the duration is judged
longer for the sculpture implying more movement than for the
sculpture implying less movement (Nather et al., 2011).

Expertise leads to a fine tuning of timing abilities. Professional
pianists asked to reproduce the duration of visual displays
outperform non-pianists when observing a specific action (a
piano-playing hand), but not when observing non-specific actions
(finger-thumb opposition; Chen et al., 2013). This indicates that
musical expertise involves a selective dynamic internal representation
that allows precise estimation of the temporal duration of
observed movements related to the expert performance. Similar
results have been obtained by showing ballet steps to professional
dancers: dancers were significantly less variable in their time
estimations than non-dancers (Sgouramani and Vatakis,
2013).

MOTOR TIMING 

According to one hypothesis, some of the processes involved in
time perception, whether a single internal clock, many specialized
clocks, or distributed network representations of time, are
also used for timing motor commands (Treisman et al., 1992). As
actions must often be coordinated with external events, it seems
advantageous to use a shared representation for time perception
and motor timing. Several behavioral observations support the
notion of shared timing mechanisms between perception and motor
control: correlations between interval discrimination thresholds
and variability in the timing of repetitive tapping (Keele et al.,
1985; Ivry and Hazeltine, 1995), similar interference patterns of
sequences of auditory clicks at different frequencies on interval
estimation and response time (Treisman et al., 1992), and significant
transfer of training on a perceptual timing task to a motor timing
task (Meegan et al., 2000). The study by Carrozzo et al. (2010)
also offers supporting evidence for shared perceptuo-motor timing.
This study showed that the effects of an animate context were
similar for the explicit perceptual judgment of duration and for
the manual interception of a moving target, as were the effects
of an inanimate context. These results suggested that, in both
an automatic form of motor timing and a cognitive form of time
perception, the observers became tuned to a time base intrinsically
linked to a background character.

Imaging studies also suggest a shared neural substrate for perceptual
and motor timing. For example, sustained perceptual
analysis of temporal patterns presented in the auditory or visual
modality activates brain areas that are generally involved in motor
preparation and coordination (Schubotz et al., 2000). However, not all
timing tasks share the same timing mechanisms. Thus, timing variability
is not correlated between repetitive tapping and continuous
periodic drawing (Robertson et al., 1999; Zelaznik et al., 2002), and
adaptation to visual motion may differently affect the perception of
interval duration and the timing of anticipatory interceptive action
(Marinovic and Arnold, 2012, but see Carrozzo et al., 2010). Moreover,
the motor system uses a state representation instead of a time
representation during adaptation to mechanical perturbations to
arm movements (Conditt and Mussa-Ivaldi, 1999; Karniel and
Mussa-Ivaldi, 2003).

On the timescale of a few hundred milliseconds, the perception
of time elapsed between events may be related to movement
planning and to the representation of movement duration. Simple
movements of different durations often show kinematic regularities
suggesting that duration is controlled by adjusting a small
number of parameters. For example, the spatial trajectory of
point-to-point reaching movements is independent of movement
duration and its tangential velocity profile is invariant when normalized
for speed (Soechting and Lacquaniti, 1981; Atkeson and Hollerbach,
1985). Because of the non-linearity of the musculo-skeletal
system, invariant kinematic features across movements with different
durations require significant variation in the muscle patterns.
However, the muscle patterns underlying movements with different
spatial and temporal characteristics can be generated by scaling
in amplitude and time and by shifting in time a small number of
time-varying muscle synergies, i.e., coordinated recruitments of
groups of muscles with specific activation profiles (d'Avella and
Lacquaniti, 2013). Invariant trajectories and speed profiles can be
achieved by scaling the amplitude of time-normalized dynamic
torque profiles by the square of the inverse of the movement duration
(Hollerbach and Flash, 1982; Atkeson and Hollerbach, 1985),
and similar scaling rules have been reported for time-varying
muscle synergies (d'Avella et al., 2008). Thus, the control of movement
duration may be achieved by setting the amplitude and
the duration of a small number of time-varying muscle synergies
(Figure 5).
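
The inverse-square scaling rule for dynamic torques can be illustrated numerically. The following is a minimal sketch under simplifying assumptions that are not part of the studies cited above: a single joint with constant inertia (so only the inertial torque is considered, and gravity is ignored) and a minimum-jerk angular trajectory chosen only because its acceleration has a simple closed form. The function names and the numerical values are hypothetical.

```python
def min_jerk_acceleration(s, amplitude, duration):
    """Angular acceleration of a minimum-jerk movement at normalized time s in [0, 1]."""
    # Second derivative of amplitude * (10 s^3 - 15 s^4 + 6 s^5) with s = t / duration.
    shape = 60.0 * s - 180.0 * s**2 + 120.0 * s**3
    return amplitude * shape / duration**2


def dynamic_torque_profile(duration, amplitude=1.0, inertia=0.05, n_samples=11):
    """Inertial torque sampled at equally spaced normalized times (assumed single joint)."""
    times = [i / (n_samples - 1) for i in range(n_samples)]
    return [inertia * min_jerk_acceleration(s, amplitude, duration) for s in times]


def rescale_torque(reference_profile, reference_duration, new_duration):
    """Scale a time-normalized torque profile by the squared inverse duration ratio."""
    gain = (reference_duration / new_duration) ** 2
    return [gain * tau for tau in reference_profile]


# Torque for a 1.0 s movement, and for the same movement performed in 0.5 s.
slow = dynamic_torque_profile(duration=1.0)
fast_direct = dynamic_torque_profile(duration=0.5)
# The same fast profile obtained only by amplitude scaling of the slow one.
fast_scaled = rescale_torque(slow, reference_duration=1.0, new_duration=0.5)
```

In this toy case, halving the movement duration multiplies the time-normalized torque profile by four while leaving its shape unchanged, so that a single amplitude parameter is sufficient to change the duration of the movement.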

FIGURE 5 | Representation of timing for motor control and for
perceptual discrimination by synergy parameters. (A) Conceptual scheme
of the information processing stages for the control of motor timing, i.e.,
movement duration and movement synchronization with external events, by a
direct mapping of sensory input onto synergy recruitment parameters.
These stages are illustrated in the example of the control of an interceptive
movement: proprioceptive input about the arm posture and visual input
about the ball trajectory are combined with a priori knowledge of gravity to
predict the time-to-contact between ball and hand. The appropriate
interceptive movement is then planned in terms of synergy recruitment
parameters, which are used to generate motor commands by modulating in
amplitude and timing a set of muscle synergies. (B) Left: an example of
muscle patterns (EMGs) recorded during catching of a ball flying with three
different flight durations (columns), captured by modulation in amplitude and
timing of two time-varying muscle synergies (coordinated recruitment of
groups of muscles with specific activation profiles; the average profile of
each synergy is illustrated as a shaded area within a rectangle on the
bottom). Synergy amplitude and synergy onset time (synergy parameters)
are illustrated by the height and the left edge of the rectangles, respectively
[adapted from D'Andola et al. (2013)]. Notice that the onset of the first
synergy is aligned with the ball launch and the onset of the second synergy
with the impact of the ball with the hand. Right: a summary of the synergy
onset timing for the two synergies (columns) with respect to launch time
(top) and impact time (bottom) for six participants [adapted from D'Andola
et al. (2013)]. (C) Conceptual scheme of a hypothetical interval duration
discrimination process relying on short-term storage of synergy parameters
for a planned movement synchronized with the events defining the intervals.
Sensory input from the events defining the first interval is associated with
synergy recruitment parameters for a motor plan synchronized with those
events. These parameters are held in short-term memory until the
parameters associated with the second interval are available for comparison.

When it is necessary to synchronize a movement with an
external event, its duration must be selected according to a prediction
of the future time of occurrence of the event. Such prediction
requires an internal model of the dynamic behavior of the physical
entity or animate character associated with the event. An internal
model may be implemented explicitly, through a representation
of the relevant variables and a simulation of their time evolution,
or implicitly, as a mapping between sensory inputs and
motor outputs generating the movement. In the latter case, a
few spatio-temporal features of the sensory input may be directly
mapped onto the amplitude and timing parameters modulating
the recruitment of a few muscle synergies (see Figure 5; D'Andola
et al., 2013). This strategy reduces the storage of the information
relevant for temporal estimates to a low-dimensional mapping
between sensory and motor signals. Therefore, elapsed time may
also be represented by the synergy recruitment parameters for a
movement which is synchronized to the sensory stimuli related to
an external event. To judge the duration of an interval, the CNS
might prepare a motor plan triggered by the stimulus signaling the
onset of the interval, and synchronized with the stimulus which
indicates the end of the interval. According to this hypothesis,
when a discrimination between different time intervals is required,
only the few synergy recruitment parameters encoding the duration
of the motor plan associated with each interval would have to
be compared.
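
The hypothesis outlined above can be sketched in code. This is an illustrative toy example and not the model of D'Andola et al. (2013): it assumes a ball in free flight under gravity with no air drag, a plan described by only two onset times, and fixed latency values that are hypothetical. All names (time_to_contact, SynergyPlan, plan_interception, and so on) are introduced here for illustration only.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration (m/s^2), used as a priori knowledge


def time_to_contact(height, upward_velocity, g=G):
    """Time for a ball in free flight to reach hand level (height = 0, height >= 0)."""
    # Positive root of: height + upward_velocity * t - 0.5 * g * t**2 = 0
    return (upward_velocity + math.sqrt(upward_velocity**2 + 2.0 * g * height)) / g


@dataclass
class SynergyPlan:
    first_onset: float   # s after launch; synergy aligned with the ball launch
    second_onset: float  # s after launch; synergy aligned with the predicted impact


def plan_interception(height, upward_velocity, launch_latency=0.1, impact_lead=0.1):
    """Map sensory input directly onto synergy timing parameters (hypothetical latencies)."""
    ttc = time_to_contact(height, upward_velocity)
    return SynergyPlan(first_onset=launch_latency, second_onset=ttc - impact_lead)


def encoded_duration(plan):
    """Interval duration as represented by the stored synergy parameters."""
    return plan.second_onset - plan.first_onset


def second_interval_is_longer(stored_plan, new_plan):
    """Discriminate two intervals by comparing only their synergy parameters."""
    return encoded_duration(new_plan) > encoded_duration(stored_plan)


# First interval: ball dropped from 1 m. Second interval: ball tossed upward at 2 m/s.
stored = plan_interception(height=1.0, upward_velocity=0.0)
new = plan_interception(height=1.0, upward_velocity=2.0)
longer = second_interval_is_longer(stored, new)
```

The point of the sketch is that the comparison step never accesses an explicit clock reading: it uses only the two timing parameters held in short-term memory for each motor plan.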

CONCLUSION AND PERSPECTIVES 

The work reviewed here represents only a small fragment of a
vast literature. Nevertheless, it suffices to indicate a very complex
organization of both explicit time perception and implicit
time estimates in humans. On the one hand, there is growing
evidence for specialized mechanisms for time encoding in the
sub-second range. One important specialization we considered is
related to the animate-inanimate or living-non-living distinction.
This distinction is a basic one, because it arises early in infancy, is
cross-culturally uniform, and is critical for causal interpretations
of events. Specialization of the neural time estimates presumably
enhances the temporal resolution of sensory processing and the
ability to estimate the duration of critical events. On the other
hand, we emphasized the possibility that, although time perception
is not unitary, there are some basic factors which can affect
disparate time estimates in the same manner. Thus, we noted
that a neural time base can be entrained by external cues in a similar
manner for both perceptual judgments of elapsed time and
automatic motor control tasks. One possible reason for
shared mechanisms for computing time may lie in
the hypothesis that action observation involves an internal motor
simulation of the observed movement (Prinz, 1997; Rizzolatti and
Craighero, 2004; Choe et al., 2007). A motor resonance might
derive from the synchronization of neural time to a base intrinsically
linked to the internal simulation of the observed action.
Thus, human perception of time may be strictly linked with the
mechanisms used by humans to control their movements.

What is the relevance of all this for neurorobotics? Traditionally,
the design and implementation of cognitive, sensory, and motor
abilities in robots depend on distinct fields of expertise. However,
as we remarked at several points, the temporal dimension is shared
by most sensory, motor, and cognitive tasks. One parsimonious
solution, therefore, could be to implement in humanoids a single
architecture for dealing with time, which would apply the same
specialized mechanisms to both perception and action, similarly
to humans. The hope is that this style of implementation
might render the humanoids more acceptable to humans, thus
facilitating reciprocal interactions.
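
As a purely illustrative sketch of such an architecture, one can imagine a single clock object that is read by both a perceptual routine and a motor routine, so that any entrainment of the clock affects the two in the same manner. Everything in the example is an assumption made for illustration: the class and function names, the linear entrainment rule, and the gain value are hypothetical, and no claim is made about the direction or size of the distortions observed in humans.

```python
class SharedTimeBase:
    """One internal clock read by both perceptual and motor routines."""

    def __init__(self, rate=1.0):
        # Internal ticks per physical second (1.0 means veridical timing).
        self.rate = rate

    def entrain(self, cue_rate, gain=0.5):
        """Pull the clock rate part of the way toward an external cue rate."""
        self.rate += gain * (cue_rate - self.rate)

    def internal_elapsed(self, physical_seconds):
        """Internal time accumulated over a physical interval."""
        return self.rate * physical_seconds

    def physical_delay(self, internal_seconds):
        """Physical time needed to accumulate a given amount of internal time."""
        return internal_seconds / self.rate


def judge_duration(clock, physical_seconds):
    """Perceptual routine: report the duration of an event in internal units."""
    return clock.internal_elapsed(physical_seconds)


def schedule_movement(clock, intended_internal_delay):
    """Motor routine: physical delay at which a planned movement is released."""
    return clock.physical_delay(intended_internal_delay)


clock = SharedTimeBase()
clock.entrain(cue_rate=1.5, gain=0.5)        # e.g., a faster background motion
perceived = judge_duration(clock, 2.0)       # perceptual readout
release_time = schedule_movement(clock, 1.0) # motor readout
```

Because both routines query the same object, a designer would not need to keep separate perceptual and motor clocks consistent with each other; whether this design choice yields human-like behavior is an open empirical question.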

An interesting idea that is emerging in parallel from biology
and machine intelligence is that sensorimotor behaviors can be
constructed from primitives, the most basic components of behavior
(Poggio and Bizzi, 2004; Moro et al., 2012). For instance, we
noted above that several motor behaviors of humans appear to
be built starting from elementary muscle synergies (d'Avella and
Lacquaniti, 2013). There is also evidence that some such motor
primitives are present at very early stages of human development,
and they may be rooted in our evolutionary trajectory: indeed,
these primitives appear to have been highly preserved and recombined
during evolution (Dominici et al., 2011; Lacquaniti et al.,
2013).

Choe et al. (2007) proposed that in robotics a developmental
program can be based on a small number of non-ad hoc,
biologically grounded principles which can spontaneously and
autonomously give rise to models and goals within the artificial
agent. According to this approach, the agent might develop and
learn by starting to use rudimentary, initial, stereotypical motor
primitives (akin to those found in human development) which
would allow the agent to understand its own internal states in
terms of its own actions, possibly by keeping internal state invariance
(Choe et al., 2007). This approach would probably ensure that
the time dimension is dealt with in a similar fashion in sensory-perceptual
and motor processes, the premise with which we began
this article.